IN THE UNITED STATES 
PATENT AND TRADEMARK OFFICE 
PATENT APPLICATION 



METHODS AND APPARATUS FOR ACCESSING AND 
PROCESSING MULTIMEDIA MESSAGES STORED IN 
A UNIFIED MULTIMEDIA MAILBOX 



BACKGROUND OF THE INVENTION 

1. Field of the Invention

The invention relates to methods and apparatus
for processing multimedia messages. More particularly,
the invention relates to methods and apparatus for
(1) converting messages from one medium to another;
(2) performing message content analysis; (3) utilizing
linguistically based analysis tools to identify message
relationships regardless of media type; (4) interrelating
messages according to content; and (5) providing a simple
message reference capability to simplify message access.



2. Brief Description of the Prior Art

Business people receive many different kinds of
messages, e.g. electronic mail, voice mail, fax, video
messages, and attachments to electronic mail. It is
possible and desirable to have all messages sent to a
single mailbox from which they may all be retrieved
regardless of the message type. However, the only
retrieval device which is capable of reading all of these
different types of messages is a personal computer having
a graphical display and audio/video capability.
Unfortunately, it is not always possible or convenient to
retrieve messages with a personal computer.

A unified mailbox where all kinds of media
(voice, fax, e-mail, and video) are made accessible
and/or visible from virtually anywhere to a subscriber or
user in one basket is a convenient means of communication
when compared to handling multiple mailboxes with
distinct media. Current solutions for a unified mailbox
are inefficient, however, for someone with an intense
communication style and a frequent need to handle his/her
messages remotely. The mismatch between the media type of
the information and the capabilities of the various (often
limited) devices used for remote access places a heavy
burden on the user and the interface of the system. This
is especially true for interfaces utilizing a
telephone with no display, or handheld devices with
limited display capabilities.

Some of the problems arise in the context of
compound and/or lengthy messages in connection with one
or the other access means. For example, it is not
possible to deliver voice and fax messages to a text-only
e-mail capable device. It is also difficult to deal with
lengthy e-mails delivered to a voice-only interface or to
a text interface with limited capabilities. Even when
the device has a fully functional GUI, there is
room for increased efficiency with large amounts of data.
It is a challenge to efficiently present the information
in various office document formats (e.g., word processor,
spreadsheet, and presentation) associated with a
message. It is often difficult to locate and visually
present related messages and attachments. When the
mailbox has many messages in it, it is difficult to
reference the messages.

Other problems arise due to the increased 
amount of information the unified mailbox can provide. 
Current mechanisms for organizing and presenting 
relationships among messages (listing by arrival time, 
subject, sender, etc.) are insufficient for a large 
number of messages of varying media and, especially, 
mixed media within a given message. 

It would be desirable to provide a flexible, 
media independent way of finding and navigating related 
messages. With current systems, for example, the user is 
unable to recognize that there is a relationship between 
a voice message and a fax without listening to the voice 
message and displaying/printing the fax. 

Because the presentation of unified mailbox 
information is more complex, especially if relationships 
as described hereinabove are incorporated into the 
presentation, identifying an individual item (message or 
message attachment) for further action can become
problematic. How does the client/user identify to the
server which message is to be acted upon? Are the entire
message and its attachments to be involved? Is it a
single attachment or only the original message body? And
if the messages are presented in a "graph" format, how
does the user select an individual item?

Current unified mailbox systems offer media
sensitivity for message retrieval only when accessed with
a graphical user interface (GUI) from a PC client. If a
particular media file or office document is attached to an
e-mail, the user needs to click on it in order to launch
a specific application, for example, an audio player for
voice, a TIFF viewer for fax, a video player to view a
video message, etc.

For users with intense communication
requirements (e.g. executives or customer service agents
who receive hundreds of compound messages daily) there
are no means to quickly process inbox messages except by
the sender information, the subject line, and perhaps a
few lines of the message body. In order to read messages,
the user has to click on or mark a certain item in a
graphical interface in order to get to the message body.

No content summarization of lengthy text
messages or respective attachments is available yet that
would remarkably improve the efficiency of handling the
daily information avalanche in the office.

Current mailbox searching does not provide
visual display of content and temporal relationships. No
search capability exists yet for non-text messages.

If a unified mailbox is accessed from a 
telephone interface, voice and e-mail messages are 
retrievable and the user can listen to both. Existing 
text-to-speech technology provides a means to convert the 
e-mail to voice. A fax message can be forwarded to a fax 
machine or printer. 

However, if an e-mail contains an attachment, 
the systems are able to indicate that, but are unable to 
access its content. Similarly, the contents of a fax or 
other documents attached to an e-mail are indicated but 
not accessible to the user accessing the mailbox with a 
telephone interface.

If an e-mail is lengthy, the user may be able 
to navigate through it by accelerating the text-to-speech 
reading speed. However, there is no means of text 
content summarization applied to shorten the process 
without eventually losing/skipping critical content. 



If messages are forwarded to a handheld device
via a wireless service but the device has limited
text-display capabilities, only certain parts of the e-mail
(From, Subject and a limited number of characters of the
message body) can be displayed. If the critical
information in the message is not in the beginning of the
message body that is displayed, it is "lost" to the
recipient. He/she has to use other access methods or
make a call into the messaging system/server to retrieve
the full text message (by listening to it or by
initiating printing to a device nearby).

As mentioned above, voice and other media
attachments are indicated but not transmitted and/or
displayed on a text-only display. The user needs to use
other access methods to retrieve the messages.
Additionally, no text content summarization methods are
utilized to deal with access device technology
limitations.

Full message sensitivity is only provided when
accessing a mailbox with a multimedia PC. However, even
multimedia PCs lack any means to summarize message
content in order to make it more efficient for the
recipient to read his/her lengthy messages. Also, there
are as yet no means to summarize the content of attached
documents.

When accessing a mailbox with a telephone, the
media and device sensitivity is limited to voice and
e-mail. Again, no techniques of text content
summarization are applied yet in order to make the
retrieval of the message information over the phone more
convenient.

In the case of handheld or mobile devices with
limited text-display capabilities, the problem is that
lengthy messages are usually not transmitted in their
entirety by the wireless/paging service providers.
Additionally, any other media attachments are "lost". No
content summarization of lengthy text messages or
respective attachments is available yet that would
remarkably improve the efficiency of handling the daily
information avalanche in the office.


SUMMARY OF THE INVENTION



It is therefore an object of the invention to 
provide methods and apparatus for accessing multimedia 
messages from a unified mailbox. 

It is also an object of the invention to 
provide methods and apparatus for converting media types 
in a unified multimedia mailbox. 

It is another object of the invention to 
provide methods and apparatus for summarizing the content 
of messages in a unified multimedia mailbox. 

It is yet another object of the invention to 
provide methods and apparatus for cross referencing 
related messages based on content. 

It is another object of the invention to 
provide methods and apparatus for improved handling of 
e-mail attachments.

It is still another object of the invention to 
provide methods and apparatus for customizing mail 
handling based on a system profile adapted to the device 
used to access the mailbox. 



In accord with these objects, which will be
discussed in detail below, the apparatus and associated
methods of the invention include a mail server that
provides a multimedia message inbox for one or several
users on a network; a subsystem that detects media
attachments to messages in a mailbox; a subsystem that
converts media attachments into another media type using
text-to-speech, fax-to-text, video voice track into text,
and speech-to-text; a subsystem that analyzes and
summarizes the text content of original or converted
media with respect to its linguistic meaning; a subsystem
that delivers appropriate media according to an access
device and message purpose, as defined in a profile; a
subsystem that identifies cross-media interrelationships
between messages and controls the media conversions
necessary for this analysis; and a subsystem that
controls a reference number scheme.

The methods and apparatus of the invention
solve the problems discussed above by utilizing advanced
media conversion methods, analysis and summarization of
message content, and intelligent forwarding concepts. The
invention provides access device and media sensitive
intelligence for a mailbox when retrieving or forwarding a
particular message.

The concept of media conversion is extended beyond
text-to-speech to other attachments; a
speaker-independent, large vocabulary, telephony-quality
speech recognition engine is utilized to convert a voice
message to text or to convert the voice track of a video
attachment into readable text. Similarly, fax
information is converted into text.

According to the invention, the content of
messages is automatically summarized. The summarization
of message content is an improvement in efficiency,
particularly in the case of a lengthy message forwarded
to a handheld device with limited display capabilities.
The same is true for reading a lengthy message over the
phone. Summarization is also applied to attached media
(e.g. fax, Word document), which further extends the
media content that is accessible.

The media conversion and the content
summarization, applied together, provide compatibility with
the access device. Depending on the user, the types of
potential access devices are usually predefined;
therefore messages, along with the attachments that form
the message content, can be tailored to those devices
while accessed or forwarded according to a profile. This
ensures the availability of more information to the
recipient at the device of choice, which is probably the
most convenient. Still, if the user requires more
information, he/she can utilize another access method.

The invention also provides cross-media
searching and visual display. Often messages related
to a specific topic of interest to the user are in
different media and spread throughout the message store
(e.g. in different folders). The cross-media search
finds these messages and presents them to the user in a
way that makes the content and time relationships clear,
allowing efficient use of the otherwise overwhelming
amount of information. The search engine utilizes
sophisticated linguistically based analysis tools to
discover the message relationships.

Additionally, a reference number scheme for all
messages is provided. All messages in a particular group
of messages of interest to the user are assigned a
reference number to be used in further actions. Thus a
PDA user can, for example, get a summary of messages with
reference numbers and an indication of the message type.
A reference number may then be used to access that
message, and through it, a particular attachment to that
message for further action. Voice commands may be used to
invoke actions on items more efficiently using the
reference numbers of messages.


BRIEF DESCRIPTION OF THE DRAWINGS



FIG. 1 is a high level block diagram of a 
multimedia mail system according to the invention. 




DETAILED DESCRIPTION 

Turning now to Figure 1, an integrated
multimedia messaging system according to the invention
includes a mail server 10 that provides a multimedia
message inbox for one or several users on a network; a
mail processor 11; a subsystem 12 that detects media
attachments to messages in a mailbox; one or more
subsystems that convert media attachments into another
media type using text-to-speech 14, fax-to-text 16, video
voice track into text 18, and speech-to-text 20; a
subsystem 22 that analyzes and summarizes the text
content of original or converted media with respect to its
linguistic meaning; a subsystem 24 that delivers
appropriate media according to an access device and
message purpose, as defined in a profile; a subsystem 26
that identifies cross-media interrelationships between
messages and controls the media conversions necessary for
this analysis; and a subsystem 28 that controls a
reference number scheme.

The invention can better be understood through
an illustrative example such as the notification of a
single-media voice message to a data pager. The
following describes an example of this process involving
a user who has a multimedia mailbox and a data pager and
who receives a voice message.

The problem is to provide the "best"
information to the pager so the user can proceed most
efficiently. What constitutes the "best" information will
vary according to the user's actual preferences, but will
most likely include sender identification and meaningful
portions of the message itself. In addition, there are
probably messages whose handling the user would prefer to
delay until an appropriate device is available.
Thus the steps for sending voice messages to a pager
would include: a) filtering messages to be processed,
b) speech-to-text conversion, c) summarization and
post-filtering, and d) selection and delivery of text
information to the device.

Since the resources involved in processing a
message may be large, messages are pre-filtered.
Speech-to-text is "expensive" in its use of resources.
Interrupting the user with any but the most important
messages can be an unnecessary expense of the user's time
and attention as well as a waste of system resources.
Thus a mechanism to prevent the presentation of a message
to a given device is important. This filtering is based
on a variety of data including sender, message priority,
etc., and the criteria for filtering are stored in the
system profile for the user.
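
The pre-filtering step described above might be sketched as follows. This is only an illustration under stated assumptions: the names Message, DeviceFilter, allowed_senders and min_priority are hypothetical and do not appear in the specification, and the convention that a higher number means a more urgent message is likewise assumed.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str
    priority: int  # assumption: a higher number means more urgent
    media: str     # e.g. "voice", "fax", "email"


@dataclass
class DeviceFilter:
    """Hypothetical per-device filter criteria kept in the user's profile."""
    allowed_senders: set = field(default_factory=set)
    min_priority: int = 0

    def passes(self, msg: Message) -> bool:
        # An empty sender set is taken to mean "any sender is acceptable".
        sender_ok = not self.allowed_senders or msg.sender in self.allowed_senders
        return sender_ok and msg.priority >= self.min_priority


pager_filter = DeviceFilter(allowed_senders={"boss@example.com"}, min_priority=2)
inbox = [
    Message("boss@example.com", 3, "voice"),
    Message("boss@example.com", 1, "voice"),
    Message("news@example.com", 3, "voice"),
]
# Only messages that pass the filter go on to the costly conversion step;
# the others simply stay in the mailbox.
to_process = [m for m in inbox if pager_filter.passes(m)]
```

Keeping the criteria in a per-device object reflects the statement that the filter prevents presentation of a message "to a given device"; a real profile would likely hold further criteria.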



Voice messages which pass through the
pre-filter are converted to text. This is most efficiently
accomplished on the server side, perhaps with a dedicated
"helper" server explicitly for the server so as not to
disturb other processing on the server. The resulting
text message is then associated with the original
message (as the text message body or as a separate
attachment).

Before the text message is sent to the pager,
it is subjected to post-conversion filtering and
summarization. Post-conversion filtering is optional;
it prevents processing of messages that appear not to be
on a topic deemed important to the user. If a message does
not appear important, it remains in the mailbox to
be processed later. If the message survives the
post-conversion filtering step, the text is then summarized.

Most simply, summarization includes reduction
to a list of keywords and phrases found within the text.
The summarization is created by removing from the message
words/phrases not found within the user-defined list of
keywords/phrases. More complex summarization includes
allowing the user to specify the keyword/phrase list
based on the sender of the message.
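
A minimal sketch of this keyword-based reduction, assuming case-insensitive whole-word matching (the specification does not state how matching is done). The function name summarize and the tables KEYWORDS_BY_SENDER and DEFAULT_KEYWORDS are hypothetical names chosen for illustration.

```python
import re


def summarize(text, keywords):
    """Return the listed keywords/phrases found in text, in order of appearance."""
    if not keywords:
        return []
    # Try longer phrases first so "quarterly report" wins over "report".
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in ordered) + r")\b",
        re.IGNORECASE,
    )
    return [m.group(1).lower() for m in pattern.finditer(text)]


# Hypothetical per-sender keyword lists, with a fallback list.
DEFAULT_KEYWORDS = ["meeting", "call"]
KEYWORDS_BY_SENDER = {
    "boss@example.com": ["meeting", "quarterly report", "report", "call"],
}


def summarize_for_sender(sender, text):
    return summarize(text, KEYWORDS_BY_SENDER.get(sender, DEFAULT_KEYWORDS))


summary = summarize_for_sender(
    "boss@example.com",
    "Please call me about the quarterly report before the meeting on Friday.",
)
```

Here everything except the listed keywords and phrases is dropped, which is the simplest form of summarization described above; the per-sender table illustrates the "more complex" variant.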



Since the message is a speech-to-text
conversion, the keywords and their homonyms should be
checked. An option on the summarization, for example a
check box that says "allow homonyms", could be utilized
to enable this feature.
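
One way the "allow homonyms" option could work is to expand the keyword list before matching, so that a word mis-recognized as its homonym still counts. The sketch below assumes a small hand-made homonym table; HOMONYMS and expand_keywords are hypothetical names, and a real system would need a much larger table or a phonetic comparison.

```python
# Hypothetical homonym table for illustration only.
HOMONYMS = {
    "sale": {"sail"},
    "meet": {"meat"},
    "weather": {"whether"},
}


def expand_keywords(keywords, allow_homonyms):
    """Return the keyword set, optionally widened with known homonyms."""
    expanded = set(keywords)
    if allow_homonyms:
        for keyword in keywords:
            expanded |= HOMONYMS.get(keyword, set())
    return expanded


with_homonyms = expand_keywords(["sale", "budget"], allow_homonyms=True)
without_homonyms = expand_keywords(["sale", "budget"], allow_homonyms=False)
```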


Even more complex summarization methods 
contemplated by the invention involve performing 
sophisticated grammatical parsing and analysis. 



Data is transmitted to the pager based on
user-defined data selection criteria, which are stored as a
template in the system profile for the user. The data
available for selection includes sender name, time,
summary, message priority, un-summarized text, and other
fields as available.



The user describes a template that indicates
the information desired and the number of
characters of each field desired. For example:

"From %SENDER% at %TIME%: %100SUMMARY%"

indicates that the user wants a string that includes the
entire sender name, the received time and the first 100
characters of the summary to appear on his/her pager.
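
The template expansion just described can be sketched in a few lines. The placeholder syntax (%NAME% with an optional leading character count) is taken from the example; the function name render, the field dictionary, and the choice to render unknown fields as empty strings are assumptions made for illustration.

```python
import re

# %NAME% or %<n>NAME%, where <n> is the maximum number of characters to keep.
FIELD = re.compile(r"%(\d*)([A-Z_]+)%")


def render(template, fields):
    """Fill a pager template from a dict of field values."""
    def substitute(match):
        limit, name = match.group(1), match.group(2)
        value = str(fields.get(name, ""))  # assumption: unknown fields render empty
        return value[: int(limit)] if limit else value

    return FIELD.sub(substitute, template)


page = render(
    "From %SENDER% at %TIME%: %100SUMMARY%",
    {"SENDER": "Pat Lee", "TIME": "09:15", "SUMMARY": "x" * 250},
)
```

With these inputs the sender and time appear in full and the 250-character summary is cut to its first 100 characters.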

When the user receives the page, the summary
information gives him/her enough information to determine
how critical the message is. If it appears critical,
he/she may choose to access the entire message using a
different device, e.g. a telephone.

Another example is the retrieval of text
messages (such as e-mail) via a telephone. Text messages
are pre-filtered as described above. The text is then
summarized. The summary is then converted to speech
which is played on the telephone to the user calling in
for messages.

Still another example is sending a fax message
to a PDA. Fax messages are pre-filtered based on sender
and priority. The fax messages which pass through the
filter are converted to text with OCR (optical character
recognition) software. The text is summarized. Data is
selected using a user-defined template. The text message
is sent to the PDA and the user is "notified".

In general, a user can define a "morphing
process" for messages in the context of any particular
target device such as a pager or a cell phone with a
limited display.

The morphing process is a combination of
message filtering, message restructuring, data
conversion, data summarization, data selection and
notification steps that are configured to handle
particular media types for particular target devices.
Each user may define a set of rules and parameters for
each device type defining how messages are morphed.

For example, a user may have a Voice
Message-to-Pager morph definition that would do the
following:

(a) filter messages based on sender and priority,
removing from further processing (i.e. leaving
on the server) messages that are not deemed
urgent enough to disturb the user while out of
the office;

(b) perform speech-to-text conversion;

(c) summarize the text based on criteria defined by
the user;

(d) perform further filtering based on the
summarized/converted text;

(e) organize the text in a template; and

(f) send the message to the pager.

In general, a morphing process will include
these steps in some order determined by the user. In
addition, message restructuring steps allow the user to
handle multiple attachments of varying media attached to
the message. For example, the user may select that a
summary of the attachments be created (attachment name
and media type) or may request that the attachments be
expanded, converted and summarized as described above for
the single media message.
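
A morph definition of this kind can be pictured as an ordered list of steps, each of which either passes the message on or stops processing so that the message stays on the server. The sketch below is illustrative only: every function name is hypothetical, the speech-to-text step is a placeholder that reads a pre-supplied transcript rather than calling a recognition engine, and the keyword set and priority threshold are made up.

```python
from typing import Callable, List, Optional

# A step returns the (possibly enriched) message, or None to stop processing.
Step = Callable[[dict], Optional[dict]]


def run_morph(message: dict, steps: List[Step]) -> Optional[dict]:
    for step in steps:
        message = step(message)
        if message is None:
            return None  # filtered out: the message stays on the server
    return message


def prefilter(msg):
    return msg if msg["priority"] >= 2 else None


def speech_to_text(msg):
    # Placeholder: a real system would invoke a speech recognition engine.
    return {**msg, "text": msg["transcript_stub"]}


def summarize(msg):
    keywords = {"contract", "deadline"}
    words = [w.strip(".,").lower() for w in msg["text"].split()]
    return {**msg, "summary": " ".join(w for w in words if w in keywords)}


def postfilter(msg):
    return msg if msg["summary"] else None


def apply_template(msg):
    return {**msg, "page": f"From {msg['sender']}: {msg['summary'][:100]}"}


def send_to_pager(msg):
    print(msg["page"])  # stand-in for handing the text to a paging service
    return msg


# Steps (a) to (f) in the order given above; a user could reorder the list.
VOICE_TO_PAGER = [prefilter, speech_to_text, summarize,
                  postfilter, apply_template, send_to_pager]

urgent = {"sender": "Pat", "priority": 3,
          "transcript_stub": "The contract deadline is Friday."}
routine = {"sender": "Pat", "priority": 1,
           "transcript_stub": "Lunch next week?"}
delivered = run_morph(urgent, VOICE_TO_PAGER)
held_back = run_morph(routine, VOICE_TO_PAGER)
```

Representing the morph as a plain list makes the "some order determined by the user" point concrete: a different device type would simply use a different list of steps.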

There have been described and illustrated
herein methods and apparatus for processing multimedia
messages. While particular embodiments of the invention
have been described, it is not intended that the
invention be limited thereto, as it is intended that the
invention be as broad in scope as the art will allow and
that the specification be read likewise. It will
therefore be appreciated by those skilled in the art that
yet other modifications could be made to the provided
invention without deviating from its spirit and scope as
so claimed.